contextual memory
AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory
Jain, Jitesh, Maheshwari, Shubham, Yu, Ning, Hwu, Wen-mei, Shi, Humphrey
Riding on the success of LLMs with retrieval-augmented generation (RAG), there has been a growing interest in augmenting agent systems with external memory databases. However, the existing systems focus on storing text information in their memory, ignoring the importance of multimodal signals. Motivated by the multimodal nature of human memory, we present AUGUSTUS, a multimodal agent system aligned with the ideas of human memory in cognitive science. Technically, our system consists of 4 stages connected in a loop: (i) encode: understanding the inputs; (ii) store in memory: saving important information; (iii) retrieve: searching for relevant context from memory; and (iv) act: perform the task. Unlike existing systems that use vector databases, we propose conceptualizing information into semantic tags and associating the tags with their context to store them in a graph-structured multimodal contextual memory for efficient concept-driven retrieval. Our system outperforms the traditional multimodal RAG approach while being 3.5 times faster for ImageNet classification and outperforming MemGPT on the MSC benchmark.
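The idea of associating semantic tags with their context for concept-driven retrieval can be illustrated with a minimal sketch. This is a toy illustration under stated assumptions, not the AUGUSTUS implementation: the class name `TagGraphMemory`, its methods, the tag-overlap scoring rule, and the sample data are all hypothetical, and the abstract does not specify how tags are extracted or ranked.

```python
from collections import defaultdict


class TagGraphMemory:
    """Toy tag-to-context memory (hypothetical; not the AUGUSTUS code)."""

    def __init__(self):
        # Each semantic tag points to the ids of the contexts it appeared in.
        self.tag_to_contexts = defaultdict(set)
        # Each context id maps to its stored payload (text, image path, ...).
        self.contexts = {}

    def store(self, context_id, payload, tags):
        """Save a context and link it to every tag that describes it."""
        self.contexts[context_id] = payload
        for tag in tags:
            self.tag_to_contexts[tag.lower()].add(context_id)

    def retrieve(self, query_tags, top_k=3):
        """Rank contexts by the number of query tags they share (assumed rule)."""
        scores = defaultdict(int)
        for tag in query_tags:
            for context_id in self.tag_to_contexts.get(tag.lower(), ()):
                scores[context_id] += 1
        # Highest overlap first; ties broken by context id for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(cid, self.contexts[cid]) for cid, _ in ranked[:top_k]]


memory = TagGraphMemory()
memory.store("c1", "photo of a golden retriever at the beach", ["dog", "beach"])
memory.store("c2", "user said they adopted a dog named Max", ["dog", "Max"])
memory.store("c3", "receipt for a surfboard rental", ["beach", "surfboard"])

results = memory.retrieve(["dog", "beach"])
```

Here the lookup follows tag-to-context links directly instead of scanning every stored item, which is one plausible reading of why a concept-driven index could be cheaper than a nearest-neighbour search over a vector database.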
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
Zhang, Dianxing, Li, Wendong, Song, Kani, Lu, Jiaye, Li, Gang, Yang, Liuchun, Li, Sheng
Under a unified operational definition, we define LLM memory as a persistent state written during pretraining, finetuning, or inference that can later be addressed and that stably influences outputs. We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). We link mechanism, evaluation, and governance via the chain write -> read -> inhibit/update. To avoid distorted comparisons across heterogeneous setups, we adopt a three-setting protocol (parametric only, offline retrieval, online retrieval) that decouples capability from information availability on the same data and timeline. On this basis we build a layered evaluation: parametric (closed-book recall, edit differential, memorization/privacy), contextual (position curves and the mid-sequence drop), external (answer correctness vs snippet attribution/faithfulness), and procedural/episodic (cross-session consistency and timeline replay, E MARS+). The framework integrates temporal governance and leakage auditing (freshness hits, outdated answers, refusal slices) and uncertainty reporting via inter-rater agreement plus paired tests with multiple-comparison correction. For updating and forgetting, we present DMM Gov: coordinating DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop covering admission thresholds, rollout, monitoring, rollback, and change audits, with specs for timeliness, conflict handling, and long-horizon consistency. Finally, we give four testable propositions: minimum identifiability; a minimal evaluation card; causally constrained editing with verifiable forgetting; and when retrieval with small-window replay outperforms ultra-long-context reading. This yields a reproducible, comparable, and governable coordinate system for research and deployment.
Lucia: A Temporal Computing Platform for Contextual Intelligence
These models exhibit an unprecedented ability to understand and generate human-like language, process visual and auditory information, and interpret 3D spatial environments (Zhao et al., 2023; Yin et al., 2023; Engel et al., 2023). However, as we push the boundaries of AI, a new frontier emerges: Temporal Computing--the understanding and utilization of time to construct contextual memory that enhances human cognition. This evolution has paved the way for devices that are not only intelligent but also temporally aware, deeply [...]
[...] Project Aria (Engel et al., 2023), Meta's all-day wearable AR glasses developed as data collection tools for spatial computing. While Project Aria aims to shift computing paradigms by blending digital interactions into the 3D world through spatial computing, Lucia extends these ideas by emphasizing the temporal dimension. It prioritizes the continuous capture and intelligent interpretation of user activities over time while enhancing practical usability: Lucia creates a device that not only records but also understands and provides insightful responses based on the user's temporal expe- [...]
RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
Kagaya, Tomoyuki, Yuan, Thong Jing, Lou, Yuxuan, Karlekar, Jayashree, Pranata, Sugiri, Kinose, Akira, Oguri, Koki, Wick, Felix, You, Yang
Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. To address this, we propose the Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness: it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.
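Retrieving past experiences that match the current situation can be sketched as a similarity search over stored episodes. This is a minimal sketch under stated assumptions, not RAP's actual method: the abstract does not describe the retrieval mechanism, so the bag-of-words cosine similarity, the function names `cosine` and `retrieve_experiences`, and the `situation`/`plan` record fields are all hypothetical stand-ins.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_experiences(situation, experiences, top_k=1):
    """Return the top_k stored experiences most similar to the situation."""
    query = Counter(situation.lower().split())
    scored = [
        (cosine(query, Counter(exp["situation"].lower().split())), exp)
        for exp in experiences
    ]
    # Sort on the score only, so the dict records are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [exp for _, exp in scored[:top_k]]


# Hypothetical memory of past episodes and the plans that worked for them.
experiences = [
    {"situation": "find a mug in the kitchen", "plan": "open cabinet, take mug"},
    {"situation": "put a book on the desk", "plan": "pick up book, go to desk, place book"},
]

best = retrieve_experiences("find a clean mug in the kitchen", experiences)
```

The retrieved plan would then be placed in the agent's prompt as context for planning the current step; a multimodal variant would swap the word counts for image or joint embeddings.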